Combining Thesaurus Knowledge and Probabilistic Topic Models

نویسندگان

  • Natalia V. Loukachevitch
  • Michael Nokel
  • Kirill Ivanov
چکیده

In this paper we present the approach of introducing thesaurus knowledge into probabilistic topic models. The main idea of the approach is based on the assumption that the frequencies of semantically related words and phrases, which are met in the same texts, should be enhanced: this action leads to their larger contribution into topics found in these texts. We have conducted experiments with several thesauri and found that for improving topic models, it is useful to utilize domain-specific knowledge. If a general thesaurus, such asWordNet, is used, the thesaurus-based improvement of topic models can be achieved with excluding hyponymy relations in combined topic models.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجره‌های هم‌پوشان

A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...

متن کامل

Domain Specific Knowledge-based Information Retrieval Model using Knowledge Reduction

Information is a meaningful collection of data. Information retrieval (IR) is an important tool for changing data to information. Of the three classical IR models (Boolean, Support Vector Machine, and Probabilistic), the Support Vector Machine (SVM) IR model is most widely used. But this model does not convey enough relevancies between a query and documents to produce effective results reflecti...

متن کامل

Combining Words and Speech Prosody for Automatic Topic Segmentation

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topic units. The approach combines hidden Markov models, statistical language models, and prosody-based decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach o...

متن کامل

Are Self-Assessments Reliable Indicators of Topic Knowledge?

Self-assessment of topic/task knowledge is a human metacognitive capacity that impacts information behavior, for example through selection of learning and search strategies. It is often used as a measure in experiments for evaluation of results and those measurements are taken to be generally reliable. We conducted a user study (n=40) to test this by constructing a concept-based topic knowledge...

متن کامل

DOMAIN-SPECIFIC KNOWLEDGE-BASED INFORMATION RETRIEVAL MODEL USING KNOWLEDGE REDUCTION By CHANGWOO YOON A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA

of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy DOMAIN-SPECIFIC KNOWLEDGE-BASED INFORMATION RETRIEVAL MODEL USING KNOWLEDGE REDUCTION By Changwoo Yoon August 2005 Chair: Douglas D. Dankel II Major Department: Computer and Information Science and Engineering Information is a meaningful...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017